Chinese-Uyghur Bilingual Lexicon Extraction Based on Weak Supervision

Abstract

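Bilingual lexicon extraction is useful, especially for low-resource languages that can leverage high-resource languages. The Uyghur language is a derivative language, and its language resources are scarce and noisy. Moreover, it is difficult to find a bilingual resource to utilize the linguistic knowledge of other large-resource languages, such as Chinese or English. There is little related research on unsupervised extraction for the Chinese-Uyghur languages, and the existing methods mainly focus on term extraction based on translated parallel corpora. Accordingly, unsupervised knowledge extraction methods are effective, especially for low-resource languages. This paper proposes a method to extract a Chinese-Uyghur bilingual dictionary by combining the inter-word relationship matrix mapped by the neural network cross-language word embedding vector. A seed dictionary is used as a weak supervision signal. A small Chinese-Uyghur parallel data resource is used to map the multilingual word vectors into a unified vector space. As the word particles of these two languages are not well coordinated, stems are used as the main linguistic particles. The strong inter-word semantic relationship of word vectors is used to associate Chinese-Uyghur semantic information. Two retrieval indicators, nearest neighbor retrieval and cross-domain similarity local scaling, are used to calculate similarity and extract bilingual dictionaries. The experimental results show that the accuracy of the Chinese-Uyghur bilingual dictionary extraction method proposed in this paper is improved to 65.06%. This method helps improve Chinese-Uyghur machine translation, automatic knowledge extraction, and multilingual translations.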
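The abstract names three ingredients: a seed dictionary as weak supervision, a mapping of both languages' word vectors into one space, and two retrieval criteria (nearest neighbor and cross-domain similarity local scaling, CSLS). The following is a minimal sketch of how these could fit together, not the paper's implementation. It assumes an orthogonal Procrustes mapping as a simple stand-in for the neural network mapping the abstract mentions, and all function names and the default k=10 are hypothetical choices made for illustration.

```python
import numpy as np


def normalize(vectors):
    """Scale each row to unit length so dot products are cosine similarities."""
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)


def fit_orthogonal_map(src_seed, tgt_seed):
    """Fit an orthogonal matrix W so that src_seed @ W approximates tgt_seed.

    Rows of src_seed and tgt_seed are the embeddings of seed-dictionary pairs.
    This is orthogonal Procrustes (an assumption, not the paper's mapping).
    """
    u, _, vt = np.linalg.svd(src_seed.T @ tgt_seed)
    return u @ vt


def nearest_neighbor(mapped_src, tgt):
    """For each mapped source word, return the index of the most similar target word."""
    sims = normalize(mapped_src) @ normalize(tgt).T
    return sims.argmax(axis=1)


def csls(mapped_src, tgt, k=10):
    """Retrieve translations with cross-domain similarity local scaling.

    CSLS(x, y) = 2 * cos(x, y) - r_tgt(x) - r_src(y), where r_tgt(x) is the mean
    cosine of x to its k nearest target words and r_src(y) is the mean cosine
    of y to its k nearest source words. This penalizes "hub" words that are
    close to everything.
    """
    sims = normalize(mapped_src) @ normalize(tgt).T
    k = min(k, sims.shape[0], sims.shape[1])
    r_of_src = np.sort(sims, axis=1)[:, -k:].mean(axis=1)  # one value per source word
    r_of_tgt = np.sort(sims, axis=0)[-k:, :].mean(axis=0)  # one value per target word
    scores = 2 * sims - r_of_src[:, None] - r_of_tgt[None, :]
    return scores.argmax(axis=1)
```

In use, the rows of the source and target matrices would hold stem-level embeddings for the two languages, the seed pairs would be passed to `fit_orthogonal_map`, and the remaining source vectors multiplied by the fitted matrix would be passed to `nearest_neighbor` or `csls` to propose dictionary entries.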

Similar resources

Bilingual Lexicon Extraction From Internet

This paper introduces an experimental system which can extract translations of words and phrases from the Internet through alignment on parallel WWW pages. The automatic extraction takes place online, is language independent and incrementally formed after a post-editing step by a human being. Actually the experimental system can extract words and phrases between pairs of the languages English, ...

Corpus-Driven Bilingual Lexicon Extraction

This paper introduces some key aspects of machine translation in order to situate the role of the bilingual lexicon in transfer-based systems. It then discusses the data-driven approach to extracting bilingual knowledge automatically from bilingual texts, tracing the processes of alignment at different levels of granularity. The paper concludes with some suggestions for future work. 1 Machine T...

Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction

A pivot-based approach for bilingual lexicon extraction is based on the similarity of context vectors represented by words in a pivot language like English. In this paper, in order to show validity and usability of the pivot-based approach, we evaluate the approach in company with two different methods for estimating context vectors: one estimates them from two parallel corpora based on word as...

Low-resource bilingual lexicon extraction using graph based word embeddings

In this work we focus on the task of automatically extracting a bilingual lexicon for the language pair Spanish-Nahuatl. This is a low-resource setting where only a small parallel corpus is available. Most of the downstream methods do not work well under low-resource conditions. This is especially true for the approaches that use vectorial representations like Word2Vec. Our proposal is ...

Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision

Building bilingual lexica from non-parallel data is a longstanding natural language processing research problem that could benefit thousands of resource-scarce languages which lack parallel data. Recent advances of continuous word representations have opened up new possibilities for this task, e.g. by establishing cross-lingual mapping between word embeddings via a seed lexicon. The method is h...

Journal

Journal title: Information

Year: 2022

ISSN: 2078-2489

DOI: https://doi.org/10.3390/info13040175